A Practical Video Database Based on Language and Image Analysis

Authors

  • Yiqing Liang
  • Bede Liu
Abstract

We integrated a practical digital video database system based on language and image analysis with components from digital video processing, still image search, information retrieval, and closed captioning processing. The aim is to use the multiple modalities of information in video and to implement data fusion among them. Keyframes are extracted to represent shots based on video content. The logical structure of the video is discovered and represented as a Scene Transition Graph (STG). The storyboard, the STG, and catalog information provide means to browse video content non-linearly. Text from closed captioning is extracted and associated with shots. Text search and image similarity search provide direct access to video shots.

We integrated a practical digital video database system based on language and image analysis with components from digital video processing, still image search, information retrieval, closed captioning processing, and Internet and Web page programming. The research and development project currently holds digitized video programs totaling 13 hours and 50 minutes, of which 1 hour and 5 minutes are stored at Advance Inc. and 12.75 hours at Princeton University. These MPEG video stories take up about 8.4 GB of disk space. The content focuses on two areas: the US military operation in Bosnia and US presidential politics.

1. Multiple Modality from Video Information

One key feature of our research is data fusion (i.e., making use of all the types of information that coexist in video). What is needed is a system that can use the multiple tracks of information embedded in digital video to obtain retrieval terms that characterize the situation shown in the video. These tracks should include audio, video, closed captions, captions, sound, music, special effects, catalog information, and transcripts.
Audio, text, sound track, and image analysis should work hand in hand to disambiguate the terms, with imagery identifying speakers or objects, language analysis providing the keys needed for image analysis, and the sound track (i.e., music and special effects) assisting further interpretation of the contents.

Text from Speech Recognition

Extracting transcripts from the audio track through speech recognition was the first task examined. Our intention is to study the application of existing speech recognition algorithms to the extraction of words from the audio track. The development of new speech recognition algorithms, although not the goal of our research, was examined for the purpose of extracting keywords as part of the data fusion effort. The intent of this research is to find a speech recognition system capable of performing dictation and of transforming human speech from the audio portion of digitized video data into text for use in text queries. A speech recognition engine must be capable of the following:

  • Speaker-independent: the need to train the system on the individual speakers in a video must be kept to a bare minimum. The system should be adaptive in order to increase recognition accuracy.
  • Continuous speech recognition: the system must be able to recognize continuous, natural human speech. Discrete speech recognition requires short silences to separate individual words and therefore performs poorly on ordinary human speech. Speaker and vocabulary adaptation is desirable.
  • Unrestricted dictation: restrictions in terms of noise level, domain specifics, or vocabulary size are not acceptable in the application.

Many commercial packages for speech recognition were reviewed, including ones from research efforts.
Special attention was paid to packages capable of reading MPEG files in order to recognize speech, in view of our plan to use MPEG as the digital video standard and the fact that MPEG files for video and audio can be generated at digitization time. None of the reviewed packages, including IBM VoiceType Dictation, Microsoft Whisper, PowerSecretary from Articulate Systems, Inc., AT&T WATSON Advanced Speech Applications Platform, DragonDictate from Dragon Systems, and Philips SpeechMagic, can ...

From: AAAI Technical Report SS-97-03. Compilation copyright © 1997, AAAI (www.aaai.org). All rights reserved.
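
The abstract states that closed-caption text is associated with shots and that text search then gives direct access to those shots. The following is a minimal sketch of that idea, not the authors' implementation: it assumes each shot is known by its start time and each caption line carries a timestamp. The data layout and the names `SHOTS`, `CAPTIONS`, `shot_for_time`, `build_index`, and `search` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from collections import defaultdict
import re

# Hypothetical shot boundaries: (shot_id, start_seconds), sorted by start time.
SHOTS = [(0, 0.0), (1, 12.5), (2, 30.0)]

# Hypothetical closed-caption lines: (timestamp_seconds, text).
CAPTIONS = [
    (2.0, "US troops arrive in Bosnia"),
    (14.0, "The president speaks about the campaign"),
    (31.5, "Troops cross the river"),
]


def shot_for_time(shots, t):
    """Return the id of the last shot that starts at or before time t."""
    starts = [start for _, start in shots]
    position = bisect_right(starts, t) - 1
    return shots[max(position, 0)][0]


def build_index(shots, captions):
    """Map each lowercase caption word to the set of shot ids it occurs in."""
    index = defaultdict(set)
    for t, text in captions:
        shot_id = shot_for_time(shots, t)
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(shot_id)
    return index


def search(index, query):
    """Return the sorted shot ids whose captions contain every query word."""
    words = re.findall(r"[a-z]+", query.lower())
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))


INDEX = build_index(SHOTS, CAPTIONS)
```

With this toy data, `search(INDEX, "troops")` returns the two shots whose captions mention troops, and adding a second word such as `"troops bosnia"` narrows the result to the first shot. A real system would also need stemming, stop-word handling, and tolerance for the delay between speech and caption display, none of which is modeled here.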





Publication date: 2002